Abstract

The Elo system for rating chess players, also used in other games and sports, was 
adopted by the World Chess Federation over four decades ago. Although not without 
controversy, it is accepted as generally reliable and provides a method for assessing players' 
strengths and ranking them in official tournaments. 

It is generally accepted that the distribution of players' rating data is approximately 
normal but, to date, no stochastic model of how the distribution might have arisen has 
been proposed. We propose such an evolutionary stochastic model, which models the 
arrival of players into the rating pool, the games they play against each other, and how 
the results of these games affect their ratings. Using a continuous approximation to 
the discrete model, we derive the distribution for players' ratings at time t as a normal 
distribution, where the variance increases in time as a logarithmic function of t. We 
validate the model using published rating data from 2007 to 2010, showing that the 
parameters obtained from the data can be recovered through simulations of the stochastic 
model. 

The distribution of players' ratings is only approximately normal and has been shown 
to have a small negative skew. We show how to modify our evolutionary stochastic model 
to take this skewness into account, and we validate the modified model using the published
official rating data. 

Keywords: Elo rating system, distribution of rating data, evolutionary stochastic model 

1 Introduction 



The Elo system for rating chess players [Elo86], named after its creator Arpad Elo, has been
employed by the World Chess Federation for over four decades as a method for assessing
players' strengths and ranking them in official tournaments. Although not without controversy, it
is accepted as generally reliable, and is also used in other games and sports such as Scrabble,
Go, American football and major league basketball.

The Elo rating system is based on the model of paired comparisons [Dav88], which can be
applied to the problem of ranking any set of objects for which we have a preference relation.
The model is particularly useful in that a ranking can be obtained in situations where a
preference exists only for some of the pairs of objects under consideration. Paired comparison
models have been successfully applied to measure ability in competitive games and sports
[Joe91, Gli99], the most notable example being the widely used Elo system for rating chess
players.
Several extensions to the Elo system have been proposed, notably the Glicko [Gli99] and
TrueSkill [HMG06] Bayesian rating systems. Both these systems estimate, in addition to the
rating, the degree of uncertainty that the rating represents the player's true ability. The
uncertainty allows the system to control the change made to the rating after a game has been
played. In particular, if the uncertainty is low then the changes made to the rating should be
smaller as the rating is already reasonably accurate, while if the uncertainty is high then the
changes made to the rating should be larger.

Here we adopt the Bradley-Terry model [BT52], which provides the theoretical underpinning
of Elo's model, where the probability $p_{\alpha\beta}$ that a player A, whose strength is $\alpha$, wins
against a player B, whose strength is $\beta$, is given by the logistic function $L_C(\cdot)$, namely

$$p_{\alpha\beta} = L_C(\alpha - \beta) = \frac{1}{1 + \exp(-C(\alpha - \beta))}, \qquad (1)$$

where $C$ is a positive scaling factor. We note that $L_C(\cdot)$ is strictly monotonically increasing,
$\lim_{x \to -\infty} L_C(x) = 0$, $\lim_{x \to +\infty} L_C(x) = 1$ and $L_C(0) = 0.5$. Moreover,

$$L_C(x) + L_C(-x) = 1. \qquad (2)$$

In this paper we are interested in the distribution of ratings within the pool of players
that arises as a result of the model induced by (1). We are not aware of any research in this
direction, although it is generally accepted that this distribution is well approximated by a
Gaussian (i.e. normal) distribution [CG96, BSMG09]. It is worth mentioning that Elo [Elo86]
claimed that the distribution of ratings of established chess players was not Gaussian, and
suggested the Maxwell-Boltzmann distribution as an alternative that fitted the data he used
slightly better.

The rest of the paper is organised as follows. In Section 2 we review the Elo rating system,
and in Section 3 we do some exploratory data analysis on published official chess rating data.
We show that the Gaussian distribution provides a very good fit to the data, but there is
a small negative skew present. In Section 4 we propose an evolutionary stochastic model,
which as a first attempt assumes a symmetric distribution of ratings. The derivation of the
distribution is presented in Section 5, where we prove that the resulting distribution is indeed
normal, with the interesting feature that the variance increases with time in a logarithmic
fashion. In Section 6 we validate the model using published rating data from January 2007
to January 2010, and in Section 7 we modify the model to allow for the skewness present in
the data. With reference to this data, we show through simulation that the modified model
yields a better approximation to the actual distribution. Finally, in Section 8 we give our
concluding remarks.



2 Elo's Rating System 



We now summarise Elo's rating system [Elo86] in order to set in context the evolutionary
model that we present in Section 4.

The fundamental assumption of Elo's rating system is that each player has a current
playing strength. In a game played between players A and B, with unknown strengths
$\theta_A$ and $\theta_B$, the score of the game for player A is denoted by $S_{AB}$, where $S_{AB}$ is 1 if A wins, 0 if
A loses and 0.5 if the game is a draw. Its expected value is assumed to be [GJ99]

$$E(S_{AB}) = L_C(\theta_A - \theta_B), \qquad (3)$$

where $E(\cdot)$ is the expectation operator and

$$C = \frac{\ln 10}{400} \approx 0.0058. \qquad (4)$$

The Elo system attempts to estimate the strength $\theta_A$ of player A using a calculated
rating $R_A$, which is adjusted according to the results of games played by A. We observe that
this model is related to the Bradley-Terry model for paired comparison data [BT52]; see also
[Dav88].



After playing a game against player B, player A's rating is adjusted according to the
following formula (see equation (2) in [GJ99])

$$\text{new } R_A = \text{old } R_A + K\,(S_{AB} - E(S_{AB})), \qquad (5)$$

where $K$ (known as the K-factor) is the maximum number of points by which a rating can
be changed as a result of a single game. (A high K-factor gives more weight to recent results,
while a low K-factor increases the relative influence of results from earlier games.) In the
Elo system the K-factor is typically between 10 and 30. (There has been some controversy
involving a recent proposal by the World Chess Federation to change the K-factor [Son09,
Zul09].) For the purpose of experimentation we have fixed the K-factor at 20.

When using (5) to update $R_A$, $E(S_{AB})$ is estimated from (3) using the current values of
$R_A$ and $R_B$ as estimates of $\theta_A$ and $\theta_B$, respectively.

Player B's rating is updated similarly. We note that, after updating both A's and B's
ratings, the sum of their ratings remains unchanged. The above method can be
straightforwardly extended to the case of a player competing in a tournament, or to a number of games
played over a given period.
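
As an illustration, the update rule for a single game can be sketched in Python as follows. This is a minimal sketch rather than an official implementation: the function names `expected_score` and `elo_update` are ours, the scaling factor is taken to be ln 10/400, and the K-factor defaults to the value 20 used in this paper.

```python
import math

C = math.log(10) / 400  # scaling factor of the logistic function (assumed ln 10 / 400)


def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B, using current ratings as strength estimates."""
    return 1.0 / (1.0 + math.exp(-C * (r_a - r_b)))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 20.0):
    """Return the new ratings of A and B after one game.

    score_a is 1 for a win by A, 0 for a loss and 0.5 for a draw.
    """
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    # B's score and expected score are the complements of A's.
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


new_a, new_b = elo_update(2100.0, 2000.0, 1.0)
```

Because B's adjustment is the negative of A's, the sum of the two ratings is preserved, and a rating can never change by more than K points in one game.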

3 The Distribution of Elo Rating Data 

The World Chess Federation, known as FIDE, publishes a rating list several times each year.
Traditionally FIDE published the rating list every three months, but from 2009 has moved to
bi-monthly publication; the official rating data can be obtained from http://ratings.fide.com.

Here we are interested in the distribution of the players' ratings. It has been confirmed by
Charness and Gerchak [CG96], and by Bilalic et al. [BSMG09] that the distribution is well
approximated by a Gaussian distribution. We recall that the probability density function for
a Gaussian random variable $X$ takes the form

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \qquad (6)$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of $X$.
Figure 1: Plots of the binned rating data (left) and the fitted Gaussians (right) from January
2007 to January 2010 



With these observations in mind, we performed some exploratory data analysis on the
FIDE rating data from January 2007 to January 2010. To test the normality of the data, we
binned each of the four data sets, taking the bin width to be 20 (the fixed K-factor). The
resulting plots for January 2007 to January 2010 are shown on the left-hand side of Figure 1.

We then fitted a constant multiple of a Gaussian distribution to each of the four data sets,
using Matlab. The plots for the fitted data for January 2007 to January 2010 are shown on
the right-hand side of Figure 1. The fitted parameters, $Q$, $\mu$ and $\sigma$, are shown in Table 1,
where $Q$ is the multiplicative constant. Clearly, $Q$ is an approximation to the actual number
of players $P$. Table 1 also shows $R^2$, the coefficient of determination [Mot95]. It can be seen
that this is close to 1, which indicates a very good fit. For comparison, the last two columns in
the table show the mean $\bar{\mu}$ and standard deviation $\bar{\sigma}$ computed from the actual FIDE rating
data. On average $\bar{\sigma}$ is about 7-15 Elo points greater than the fitted standard deviation $\sigma$.



Year         Q      $\mu$   $\sigma$    $R^2$        P   $\bar{\mu}$   $\bar{\sigma}$
2007     75167   2096.400    151.604   0.9936    77056      2100.127         166.203
2008     84844   2077.400    166.452   0.9908    87075      2073.566         181.918
2009     97070   2034.000    183.706   0.9859    99223      2044.687         196.639
2010    107874   2007.600    202.092   0.9815   109373      2015.650         209.622

Table 1: The parameters for the fitted Gaussians in Figure 1



It can be seen that the plots on the left-hand side of Figure 1 appear to show a small
negative skew. (We note that this is in contrast to the positive skew of the Maxwell-Boltzmann
distribution, suggested by Elo [Elo86].) As a next step, we therefore investigated the skewness
of the data for 13 rating periods from October 2006 to September 2009. The skewness $s$ is
defined by

$$s = \frac{E\left((X - \mu)^3\right)}{\sigma^3}.$$
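
For a finite list of ratings, the skewness can be estimated by replacing the expectation with a sample average. The following Python function is a minimal sketch of that calculation; the function name and the small rating list are hypothetical and serve only as an illustration, not as FIDE data.

```python
def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    third_moment = sum((x - mean) ** 3 for x in values) / n
    return third_moment / variance ** 1.5


# Hypothetical ratings with one low outlier, which produces a negative skew.
example_ratings = [1500, 1900, 2000, 2050, 2100, 2150]
example_skew = skewness(example_ratings)
```

A long tail of low ratings, as in the example list, pulls the third central moment below zero, which is the situation described for the FIDE data.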



Figure 2: Skewness of rating (left) and mean rating linear fit (right), for 13 periods from
October 2006 to September 2009

The skewness of the actual FIDE rating data is shown in the left-hand plot in Figure 2.
As can be seen, there is a small negative skew, which has generally slowly
grown in magnitude over the period. (The increase in skewness in September 2009 is mostly due to
FIDE temporarily lowering the minimum rating for new players from 1400 to 1200, and then
reverting to the original policy in the following period.) The negative skew can be attributed
to the slow decrease in the mean rating with the growing number of players, since it is more
likely that a new player joining the pool will enter with a rating lower than the average. This
can be formalised as follows.

Let $P_1$ be the number of players in the pool at the end of the first period and let $\mu_1$ be
the mean rating of those players. We define $P_2$ and $\mu_2$ similarly for the second period. Then
the total of the ratings of all players in the pool is $P_1\mu_1$ for the first period and $P_2\mu_2$ for the
second period. Assuming the average rating of new players joining during the second period
is $\mu_1 - \epsilon$, we have

$$P_2\mu_2 = P_1\mu_1 + (P_2 - P_1)(\mu_1 - \epsilon),$$

yielding

$$\mu_2 - \mu_1 = -\epsilon\,\frac{P_2 - P_1}{P_2}. \qquad (7)$$

We can approximate (7) by the differential equation

$$\frac{d\mu}{dP} = -\frac{\epsilon}{P},$$

which has the solution

$$\mu = \mu_0 - \epsilon \ln P, \qquad (8)$$

where $\mu_0$ is a constant.

The right-hand graph in Figure 2 shows the mean Elo rating $\bar{\mu}$ plotted against the
logarithm of the number of players $P$. The linear fit shown is in good agreement with (8), with
$\epsilon = 227.6$, $\mu_0 = 4663$, and $R^2 = 0.9964$. Thus the average rating is decreasing slowly as a
linear function of the logarithm of the number of players in the pool. In addition, knowing
$\epsilon$ would allow us to predict the rate of decrease, and also to estimate the skewness shown in
the left-hand graph in Figure 2.
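
To see what the fitted line implies, the following Python sketch evaluates the logarithmic model of the mean rating at the January player counts of Table 1. The function name `predicted_mean` is ours; the constants are the fitted values quoted above. Note that the fit was made to the 13 periods up to September 2009, so the 2010 value is an extrapolation.

```python
import math

EPSILON = 227.6  # fitted slope against ln P, as quoted in the text
MU_0 = 4663.0    # fitted constant, as quoted in the text


def predicted_mean(num_players: int) -> float:
    """Mean rating predicted by the logarithmic model for a pool of num_players."""
    return MU_0 - EPSILON * math.log(num_players)


# Player counts P and actual mean ratings from Table 1.
table1 = {2007: (77056, 2100.127), 2008: (87075, 2073.566),
          2009: (99223, 2044.687), 2010: (109373, 2015.650)}

for year, (players, actual) in table1.items():
    print(year, round(predicted_mean(players), 1), actual)
```

The predictions stay within a few Elo points of the actual means and decrease as the pool grows, in line with the discussion above.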

As we have seen above, the Gaussian distribution is a good first approximation. We
pursue this further in Section 5 after we formalise the evolutionary model for players' ratings
in Section 4. We return in Section 7 to a more general model that takes skewness into account.
4 An Evolutionary Urn Transfer Model



In our evolutionary stochastic model for rating game players, two main types of event may 
take place. The first event type occurs when a new player enters the system. We make two 
assumptions related to such an event: 

(i) that new players enter the system at a fixed rate, and 

(ii) that once players enter the system they do not leave it. 

(We note that the model can be extended to allow players to leave the pool as long as the 
rate at which players enter the pool is greater than the rate at which they leave.) 

The second event type occurs when a game is played between two players. In this case, 
we assume 

(iii) that the outcome of the game is either a win or loss for the first player, and 

(iv) that every game occurs between two players of fairly similar strength; in particular, we 
assume that the absolute value of the difference in strength between the players in any 
game is at most W. 



Assumption (iii) is often made, cf. [Gli99], to avoid including extra parameters in the
model, as it is reasonable to assume that a draw is equivalent to half a win and half a loss
(which is consistent with the score of a draw being 0.5, as in Section 2); see [Joe91, Hen92,
GJ99] for alternative ways of dealing with draws. The basis for Assumption (iv) is that
players will normally play games against players of comparable strength; for example, many
tournaments are divided into separate grading sections for that reason. We note that the win
probabilities given by (1) satisfy

$$p_{\alpha\beta} + p_{\beta\alpha} = 1, \qquad (9)$$

which is consistent with Assumption (iii).

In our model, we approximate the ratings using a discrete numerical scale of values at
intervals of $I$. We use urns to store the pool of players, with each urn containing players of
approximately similar strength. Let $M$ denote the average rating of all the players. Then
$urn_k$, the $k$th urn, where $-\infty < k < \infty$, contains those players whose rating is in the range
$[M + (k - 0.5)I,\ M + (k + 0.5)I)$, i.e. the players are grouped into bins of width $I$. Thus a
player with rating $R$ will be in urn number $\lfloor 0.5 + (R - M)/I \rfloor$.
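
The mapping from ratings to urn numbers can be sketched in Python as follows. The function name `urn_index` and the example ratings are ours; the bin width defaults to 20, the value chosen later in this section.

```python
import math


def urn_index(rating: float, mean_rating: float, width: float = 20.0) -> int:
    """Urn number of a player: floor(0.5 + (R - M) / I)."""
    return math.floor(0.5 + (rating - mean_rating) / width)


# With M = 2100 and I = 20, urn 0 covers the half-open range [2090, 2110).
central = urn_index(2100.0, 2100.0)
upper_neighbour = urn_index(2110.0, 2100.0)
lower_neighbour = urn_index(2089.9, 2100.0)
```

The half-open range means that a rating exactly on the upper boundary of a bin belongs to the next urn up.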

Players enter the system at a rate $r$, where $0 < r < 1$. After playing a game, a player
may stay in the same urn or be transferred to one of the two neighbouring urns, depending
on the result of the game. We now describe the urn model in detail.

We assume a countable number of urns, with $urn_0$ being the central urn; to its left are
the urns with negative subscripts and to its right are the urns with positive subscripts. We
let $F_i(t)$ denote the number of players in $urn_i$ at stage $t$ of the stochastic process. Initially
$t = 0$, $F_0(0) = \Delta$, with $\Delta > 0$, i.e. $urn_0$ initially has $\Delta$ players in it, and all other urns are
empty, i.e. $F_i(0) = 0$ for $i \neq 0$.

When a player enters the system, an existing player A is selected uniformly at random 
from the urns and the new player is put into the same urn as player A, i.e. we assign the new 
player the same approximate rating as the selected existing player A. In other words, new 
players enter the system according to the distribution of players currently in the system. 
The stochastic process modelling the changes in rating can be viewed as a random walk
[RG04], where the probabilities of players increasing, decreasing or maintaining their ratings
depend on their current ratings, as explained below.

At time $t$, $t > 0$, a player A is chosen uniformly at random from the urns, say from $urn_i$,
i.e. $urn_i$ is selected with probability

$$\frac{F_i(t)}{P} \approx \frac{F_i(t)}{rt + \Delta}, \qquad (10)$$

where $\approx$ means is approximately equal to for large $t$. (This approximation holds since the
expectation of the number of players $P$ at time $t$ is $rt + \Delta$.)

As above, we assume $0 < r < 1$. Then one of two things may occur:

(I) with probability $r$, a new player is inserted into $urn_i$, i.e. into the same urn as the
chosen player A;

(II) with probability $1 - r$, an opponent B for the player A is chosen from urns
$urn_{i-w}, urn_{i-w+1}, \ldots, urn_{i-1}, urn_i, urn_{i+1}, \ldots, urn_{i+w-1}, urn_{i+w}$,
where $w = \lfloor W/I \rfloor$.

The probability that player B is chosen from $urn_{i+s}$ is $\pi_s$, $-w \le s \le w$, where for
symmetry we assume $\pi_{-s} = \pi_s$. Depending on the result of the game, player A either
moves to $urn_{i-1}$ or $urn_{i+1}$, or remains in $urn_i$. The probabilities of these events are
chosen so that the expected change in A's rating is identical to that prescribed by the
Elo system.

As we are working in terms of urn numbers rather than Elo ratings, we let $c = CI$, so $c$
is the scaling factor in terms of urn numbers. Thus, since $L_C(sI) = L_c(s)$ and $L_C(-sI) =
L_c(-s)$, the probability that player A wins is $L_c(-s)$, by (1). Therefore, from (5) and (2),
when A wins A's new rating is given by

$$\text{new } R_A = \text{old } R_A + K\,(1 - L_c(-s)) = \text{old } R_A + K L_c(s). \qquad (11)$$

In order to find the new urn number new $i_A$ for A, corresponding to the rating new $R_A$,
we first normalise (11) by subtracting $M$ and dividing by $I$, giving

$$\text{new } i_A = \text{old } i_A + \left(\frac{K}{I}\right) L_c(s).$$

We restrict player A to moving up or down by at most one urn. Moreover, we discretise
the change stochastically so that the new urn number will be integral but the expected change
unaffected. Hence,

$$\text{new } i_A = \text{old } i_A + \begin{cases} 1 & \text{with probability } \left(\frac{K}{I}\right) L_c(s), \\ 0 & \text{otherwise.} \end{cases} \qquad (12)$$

We note that $I$ has to be chosen so that the probability in (12) does not exceed 1 for all
$s$, $-w \le s \le w$. We therefore require $K \le I\,(1 + e^{-cw})$. For simplicity, we will choose
$I = K = 20$.
The probability that player A moves to $urn_{i+1}$ is

$$L_c(-s) \left(\frac{K}{I}\right) L_c(s),$$

i.e. the product of the probability that A wins against B and the corresponding discretisation
probability.

Similarly, when A loses we have

$$\text{new } i_A = \text{old } i_A - \left(\frac{K}{I}\right) L_c(-s).$$

Again restricting A to moving up or down by at most one urn, on stochastically
discretising, we obtain

$$\text{new } i_A = \text{old } i_A - \begin{cases} 1 & \text{with probability } \left(\frac{K}{I}\right) L_c(-s), \\ 0 & \text{otherwise.} \end{cases} \qquad (13)$$

Therefore the probability that A moves to $urn_{i-1}$ is

$$L_c(s) \left(\frac{K}{I}\right) L_c(-s),$$

i.e. the product of the probability that A loses against B and the corresponding discretisation
probability.

Let

$$\psi_s = \left(\frac{K}{I}\right) L_c(s) L_c(-s) = \left(\frac{K}{I}\right) L_c(s)\,(1 - L_c(s)). \qquad (14)$$

Then, in summary, if the selected player A is from $urn_i$ and the chosen opponent B is
from $urn_{i+s}$, $-w \le s \le w$,

(i) with probability $\psi_s$ player A moves to $urn_{i+1}$,

(ii) with probability $\psi_s$ player A moves to $urn_{i-1}$, and

(iii) with probability $1 - 2\psi_s$ player A stays in $urn_i$.

We note that $\psi_s$ is proportional to the derivative of the logistic function, viz.

$$\psi_s = \frac{K}{cI}\,\frac{dL_c(s)}{ds} = \left(\frac{K}{I}\right) \frac{e^{-cs}}{(1 + e^{-cs})^2}.$$

This symmetric bell-shaped curve is proportional to the probability density function of
the logistic distribution, with standard deviation $\pi/(c\sqrt{3})$ [EHP00].

It is easy to show that, conditional on A being chosen from $urn_i$ and B from $urn_{i+s}$,
the variance of the change in rating is $2\psi_s$, whereas with the Elo system it is only $(K/I)\psi_s$;
the additional variance is due to the stochastic discretisation. It therefore follows that the
unconditional variance in our model will also be increased by a factor of $2I/K$ compared to
that for the Elo system.
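
The variance comparison can be checked numerically. The Python sketch below is illustrative only: the function names are ours, changes are measured in urn units, and it assumes C = ln 10/400 with I = K = 20. It computes the transfer probability for an opponent s urns away, together with the conditional variance of the change under the urn model and under the undiscretised Elo update.

```python
import math

K = 20.0                      # K-factor
I = 20.0                      # urn width
c = (math.log(10) / 400) * I  # scaling factor in urn units (assumes C = ln 10 / 400)


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-c * x))


def psi(s: int) -> float:
    """Probability of moving up (or, equally, down) one urn against an opponent s urns away."""
    return (K / I) * logistic(s) * logistic(-s)


def urn_model_variance(s: int) -> float:
    """Change is +1 or -1, each with probability psi(s), and 0 otherwise; the mean is 0."""
    return 2.0 * psi(s)


def elo_variance(s: int) -> float:
    """Variance of the undiscretised Elo change, expressed in urn units."""
    gain = (K / I) * logistic(s)    # change when A wins, which has probability logistic(-s)
    loss = (K / I) * logistic(-s)   # size of the change when A loses, probability logistic(s)
    mean = logistic(-s) * gain - logistic(s) * loss
    return logistic(-s) * gain ** 2 + logistic(s) * loss ** 2 - mean ** 2


ratios = [urn_model_variance(s) / elo_variance(s) for s in range(-3, 4)]
```

Every entry of `ratios` equals 2I/K up to rounding, whatever the value of s, which is why the same factor carries over to the unconditional variance.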

It is clear that, according to the Elo model, player B's rating should be updated in a
similar manner to player A's. However, we simplify the analysis by considering each game
as essentially equivalent to two "half games", since the players are chosen randomly. It is
therefore sufficient to analyse only the change to A's rating.

(We note that, unlike the proposal in [GJ99], our evolutionary model does not take into
account, for example, the fact that junior players tend to be under-rated and to improve more
rapidly than older players.)

5 Derivation of the Distribution of Players' Ratings 

Considering all possible choices for player B, it follows from the above discussion that the
probability $\theta$ that A will move to $urn_{i+1}$ is given by

$$\theta = \sum_{s=-w}^{w} \pi_s \psi_s \qquad (15)$$

and, by symmetry, that this is also the probability that A will move to $urn_{i-1}$.

At time $t$, $t > 0$, a game is played with probability $1 - r$, and there are then the following
three possible ways that the contents of $urn_i$ may change.

(a) The player A chosen uniformly at random is selected from $urn_i$, and then plays an
opponent B from say $urn_{i+s}$. By (15), the probability that A beats B and moves to
$urn_{i+1}$ is $\theta$, that A loses to B and moves to $urn_{i-1}$ is $\theta$, and that A stays in $urn_i$ is
$1 - 2\theta$. Thus the net expected loss from $urn_i$ is $2\theta$.

(b) The player A chosen uniformly at random is selected from $urn_{i-1}$, and then plays an
opponent B from say $urn_{i-1+s}$. By (15), the probability that A beats B and moves to
$urn_i$ is $\theta$; so the net expected gain to $urn_i$ is $\theta$. (In all other cases the contents of $urn_i$
do not change.)

(c) The player A chosen uniformly at random is selected from $urn_{i+1}$, and then plays an
opponent B from say $urn_{i+1+s}$. By (15), the probability that A loses to B and moves
to $urn_i$ is $\theta$; so the net expected gain to $urn_i$ is $\theta$. (In all other cases the contents of
$urn_i$ do not change.)

If A is selected from any of the other urns, the contents of $urn_i$ do not change.

We now obtain the difference equation for the urn transfer model, by considering the
expected change to $urn_i$, as discussed above. For integer $i$ and $t > 0$,

$$E(F_i(t+1)) = F_i(t) + \frac{1 - r}{rt + \Delta} \left(\theta F_{i-1}(t) + \theta F_{i+1}(t) - 2\theta F_i(t)\right) + \frac{r}{rt + \Delta} F_i(t). \qquad (16)$$

To derive (16), we follow a mean-field theory approach, such as that in [OS01, LFLW02],
replacing $P$ by its expectation $rt + \Delta$, as in (10). The expected value of $F_i(t+1)$ is equal
to the previous number of players in $urn_i$ plus the two probabilities of inserting a player into
$urn_i$, from either $urn_{i-1}$ or $urn_{i+1}$, minus the probability of moving a player from $urn_i$ to
either of the neighbouring urns, i.e. $urn_{i-1}$ and $urn_{i+1}$, plus the probability of inserting a
new player into $urn_i$.

We now take expectations in (16), and we write $F_i(t)$ for $E(F_i(t))$. By the linearity of
$E(\cdot)$, we obtain

$$F_i(t+1) - F_i(t) = \frac{(1 - r)\theta}{rt + \Delta} \left(F_{i-1}(t) + F_{i+1}(t) - 2F_i(t)\right) + \frac{r}{rt + \Delta} F_i(t). \qquad (17)$$

We note that (17) defines a symmetric random walk by the selected player A at time
$t$, where the probability of moving right or left is proportional to $\theta$, but the probability
that A is selected decreases over time. Thus the distribution of the players in the urns
flattens asymptotically over time and the standard deviation increases, as in a diffusion process
[DB03].

We will see that in our case the variance increases logarithmically with time and thus the
distribution will flatten very slowly.
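
The logarithmic growth can be observed by iterating the expected-value recurrence directly. The Python sketch below is illustrative only: the parameter values are hypothetical and chosen to run quickly, the urns are truncated to a finite range, and the function names are ours. It compares the variance of the resulting distribution with 2 lambda ln(1 + rt/delta), where lambda = (1 - r) theta / r and delta is the initial number of players; this is the variance obtained from the continuous approximation.

```python
import math


def iterate_urns(delta, r, theta, steps, half_width):
    """Iterate the recurrence for the expected urn contents on urns -half_width..half_width."""
    size = 2 * half_width + 1
    f = [0.0] * size
    f[half_width] = float(delta)          # all initial players start in the central urn
    for t in range(steps):
        denom = r * t + delta             # expected number of players at time t
        move = (1.0 - r) * theta / denom  # coefficient of the transfer term
        grow = r / denom                  # coefficient of the new-player term
        new = f[:]
        for j in range(size):
            left = f[j - 1] if j > 0 else 0.0
            right = f[j + 1] if j < size - 1 else 0.0
            new[j] = f[j] + move * (left + right - 2.0 * f[j]) + grow * f[j]
        f = new
    return f


def variance(f, half_width):
    total = sum(f)
    return sum(((j - half_width) ** 2) * x for j, x in enumerate(f)) / total


# Hypothetical parameters, not those estimated from the FIDE data.
DELTA, R, THETA, STEPS, HALF_WIDTH = 100, 0.1, 0.2, 5000, 60
urns = iterate_urns(DELTA, R, THETA, STEPS, HALF_WIDTH)
observed = variance(urns, HALF_WIDTH)
lam = (1.0 - R) * THETA / R
predicted = 2.0 * lam * math.log(1.0 + R * STEPS / DELTA)
```

With these values the total number of players grows from 100 to 600, and `observed` and `predicted` agree closely, since the discrete sum that the recurrence accumulates is well approximated by the logarithm.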

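We now approximate our discrete model by a continuous model using a continuous function
$F(i,t)$ to approximate $F_i(t)$. In particular, we may approximate

$$F_i(t+1) - F_i(t) \quad \text{by} \quad \frac{\partial F(i,t)}{\partial t}$$

and

$$F_{i-1}(t) + F_{i+1}(t) - 2F_i(t) \quad \text{by} \quad \frac{\partial^2 F(i,t)}{\partial i^2}.$$

From (17), we thus derive the partial differential equation

$$\left(t + \frac{\Delta}{r}\right) \frac{\partial F(i,t)}{\partial t} = \lambda\,\frac{\partial^2 F(i,t)}{\partial i^2} + F(i,t), \qquad (18)$$

where

$$\lambda = \frac{(1 - r)\theta}{r} \qquad (19)$$

is a constant.

If we now let

$$F(i,t) = (rt + \Delta)\,G(i,t),$$

we can transform (18) into

$$\left(t + \frac{\Delta}{r}\right) \frac{\partial G(i,t)}{\partial t} = \lambda\,\frac{\partial^2 G(i,t)}{\partial i^2}. \qquad (20)$$

We now transform (20) into the following simple form of the standard diffusion equation
(also known as the heat equation) [DB03, RG04], by making the substitution $t = \Delta(e^z - 1)/r$
and writing $H(i,z)$ for $G(i,t)$:

$$\frac{\partial H(i,z)}{\partial z} = \lambda\,\frac{\partial^2 H(i,z)}{\partial i^2}. \qquad (21)$$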
The initial conditions of the discrete model are $F_0(0) = \Delta$, where $\Delta > 0$, and $F_j(0) = 0$
for $j \neq 0$. Since

$$\sum_{j=-\infty}^{\infty} F_j(0) = \Delta,$$

the boundary conditions for the continuous model become $F(i,0) = \Delta\,\delta(i)$, where $\delta(\cdot)$ is the
Dirac delta function. This yields the boundary conditions

$$H(i,0) = G(i,0) = \frac{F(i,0)}{\Delta} = \delta(i). \qquad (22)$$

Equation (21) with boundary conditions (22) has the following standard solution:

$$H(i,z) = \frac{1}{\sqrt{4\pi\lambda z}} \exp\left(\frac{-i^2}{4\lambda z}\right), \qquad (23)$$

and we see from (6) that this is the density function of the Gaussian distribution with mean
0 and variance $2\lambda z$.

From (23) it follows that

$$F(i,t) = \frac{rt + \Delta}{\sqrt{4\pi\lambda \ln(1 + rt/\Delta)}} \exp\left(\frac{-i^2}{4\lambda \ln(1 + rt/\Delta)}\right). \qquad (24)$$

6 Modelling the Distribution of Chess Players' Ratings 

In order to run simulations of the model that we described and analysed in Sections 4 and
5, respectively, we first need to specify or derive values for the various parameters involved.

We are assuming that $C = \ln 10/400$ and $I = K = 20$, as stated previously in Sections 2
and 4; thus $c = \ln 10/20$. We consider the cases $w = 1, 2$ and 3, and for simplicity we assume
that the urn from which the opponent B is selected is chosen uniformly, i.e. $\pi_s = 1/(2w + 1)$.
We can then compute $\psi_s$ from (14) and $\theta$ from (15).
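
As a rough check on the sizes involved, the following Python sketch evaluates the probability of moving up one urn under the uniform choice of opponent urn, for w = 1, 2 and 3. The function names are ours and the code is only a sketch of the calculation described above, assuming c = ln 10/20 and I = K = 20.

```python
import math

K = 20.0
I = 20.0
c = math.log(10) / 20  # scaling factor in urn units


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-c * x))


def psi(s: int) -> float:
    """Transfer probability against an opponent s urns away."""
    return (K / I) * logistic(s) * (1.0 - logistic(s))


def theta(w: int) -> float:
    """Average of psi over opponent offsets -w..w, each chosen with probability 1/(2w + 1)."""
    return sum(psi(s) for s in range(-w, w + 1)) / (2 * w + 1)


thetas = {w: theta(w) for w in (1, 2, 3)}
```

The values are all slightly below 0.25, the value of the transfer probability for an opponent in the same urn, and they decrease slowly as w grows because more distant opponents make a one-urn move less likely.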



Finally, we need estimates for $r$, $\Delta$ and $t$. We assume, as indicated in Section 3, that
the ratings are normally distributed; we relax this assumption in Section 7 to cater for some
degree of skewness in the distribution. In order to validate our model, we obtain estimates for
these parameters using the published official rating data from January 2007 to January 2010,
as described in Section 3. Our methodology is to extract values for these parameters from
this data, using the analysis in Section 5, and then run simulations of our model in order to
see how closely the resulting distribution matches that obtained from the actual data.

To estimate $r$ from the actual rating data, we proceed in the following way. Let $P$ be
the number of rated players recorded at January of a given year. Let $G$ be the number of
games played and $N$ be the number of new players joining the pool of rated players during the
previous year (computed as the difference between $P$ and its value for the previous January).
According to the data, the rate $r$ at which players entered the system during the previous
year is given by

$$r = \frac{N}{N + G}.$$

The values for these parameters from January 2007 to January 2010, calculated using the
official FIDE data, are presented in Table 2. In the simulations we took the rate $r$ to be
0.009553, the average rate over the complete four-year period, as shown in the summary row.
It can be seen from the table that, in reality, $r$ fluctuates somewhat, but as an approximation
we assume that $r$ is constant. We can then compute $\lambda$ from (19).


Year            P          G        N          r
2006        67349
2007        77056     881089     9707   0.010897
2008        87075    1009067    10019   0.009831
2009        99223    1181206    12148   0.010180
2010       109373    1285607    10150   0.007833
Summary              4356969    42024   0.009553

Table 2: The data used to compute r
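
The entries in the last column of Table 2 follow directly from the counts of games and new players. The short Python sketch below reproduces them; the variable and function names are ours.

```python
# Games played (G) and new players (N) during the year preceding each January, from Table 2.
games = {2007: 881089, 2008: 1009067, 2009: 1181206, 2010: 1285607}
new_players = {2007: 9707, 2008: 10019, 2009: 12148, 2010: 10150}


def entry_rate(n: int, g: int) -> float:
    """Fraction of events that are arrivals of new players: N / (N + G)."""
    return n / (n + g)


yearly_rates = {year: entry_rate(new_players[year], games[year]) for year in games}
overall_rate = entry_rate(sum(new_players.values()), sum(games.values()))
```

The overall rate is computed from the summed counts, as in the summary row, rather than as an average of the four yearly rates.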



Lastly we need to obtain values for $\Delta$ and $t$. From (10) and (24), it follows that at time $t$
the expected number of players $P$ is given by

$$P = rt + \Delta, \qquad (25)$$

and that $\sigma^2$, the variance of the rating distribution, is $2\lambda \ln(1 + rt/\Delta)$. We thus obtain

$$\Delta = P \exp\left(\frac{-\sigma^2}{2\lambda}\right). \qquad (26)$$

To get a single value for $\Delta$, we simply take the average over the years 2007 to 2010, where
we compute a year-specific value for $\Delta$ from (26) using the values of $P$ and $\sigma$ from Table 1.
Finally, we estimate $t$ using (25).
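
The following Python sketch carries out this estimation for w = 1. It rests on two assumptions of ours that the text does not state explicitly: that the standard deviation used is the fitted one from Table 1, and that it is converted from Elo points to urn units by dividing by the urn width 20. All names in the code are ours. Under these assumptions the results come out close to the first row of Table 3.

```python
import math

K = 20.0
I = 20.0
c = math.log(10) / 20  # scaling factor in urn units
r = 0.009553           # entry rate from the summary row of Table 2
w = 1


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-c * x))


def psi(s: int) -> float:
    return (K / I) * logistic(s) * (1.0 - logistic(s))


theta = sum(psi(s) for s in range(-w, w + 1)) / (2 * w + 1)
lam = (1.0 - r) * theta / r

# Number of players P and fitted standard deviation (in Elo points) from Table 1.
table1 = {2007: (77056, 151.604), 2008: (87075, 166.452),
          2009: (99223, 183.706), 2010: (109373, 202.092)}


def initial_players(p: int, sigma_elo: float) -> float:
    """Year-specific initial pool size: P * exp(-sigma^2 / (2 * lambda)), sigma in urn units."""
    sigma_urns = sigma_elo / I  # assumption: convert Elo points to urn units
    return p * math.exp(-sigma_urns ** 2 / (2.0 * lam))


initial_estimate = sum(initial_players(p, s) for p, s in table1.values()) / len(table1)
times = {year: (p - initial_estimate) / r for year, (p, _) in table1.items()}
```

Each time value is obtained by solving P = rt + (initial pool size) for t, using the averaged estimate of the initial pool size.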

For $w = 1, 2$ and 3, the estimated values for $\Delta$ and $t$ are presented in Table 3, where
the values for $t$ are rounded to the nearest 10. We also obtained alternative estimates by
replacing $P$ by $Q$ in (25) and (26); the two alternatives are indicated by the first column of
Table 3. The alternatives will be denoted by $\Delta_P, t_P$ and $\Delta_Q, t_Q$, respectively.



Using   w   $\Delta$   t until 2007   t until 2008   t until 2009   t until 2010
P       1      20701        5899130        6947900        8219530        9282010
Q       1      20242        5749450        6762420        8042210        9173160
P       2      20569        5912960        6961730        8233360        9295840
Q       2      20113        5762980        6775950        8055750        9186690
P       3      20373        5933480        6982250        8253880        9316360
Q       3      19921        5783070        6796030        8075830        9206770

Table 3: Derived $\Delta$ and $t$ for 2007 to 2010, for w = 1, 2 and 3

As mentioned above, we fixed $r$ at 0.009553, the value obtained in Table 2. For each set
of values for the parameters $w$, $\Delta$ and $t$ in Table 3, we ran 10 simulations of the stochastic
process described in Section 4, implemented in Matlab. In each case we then fitted a Gaussian
to the distribution of the number of players in the urns, again using Matlab. Each row in
Table 4 was computed from the average of the 10 simulations in exactly the same way that
the values in Table 1 were computed from the actual rating data. That is, $P$, $\bar{\mu}$ and $\bar{\sigma}$ are the
values calculated from the results of the simulations, and $Q$, $\mu$ and $\sigma$ are the values obtained
by fitting a Gaussian distribution to the simulation results. (In order to obtain Elo ratings
from the urn numbers of the players in the simulation, the urn numbers were calibrated by
means of a suitable shift. This was chosen so that the means $\bar{\mu}$ from Table 1 for each of the
four years were within the range of $urn_0$.) It can be seen that, in each row of Table 4, all the
fitted and calculated values are very close to each other. This and the fact that $R^2$ is so close
to one gives strong confirmation of our analysis in Section 5.

We now compare the fitted and calculated parameters from Table 4 with those in Table 1.
Obviously, by construction, $\mu$ and $\bar{\mu}$ are very close to the corresponding values in Table 1. In
addition, it can be seen that the values for $P$ and $Q$ when using $\Delta_P$ and $t_P$ are very close to
the values for $P$ in Table 1, and correspondingly close to the values for $Q$ in Table 1 when
using $\Delta_Q$ and $t_Q$. However, the calculated standard deviation $\bar{\sigma}$ in Table 4 is consistently
lower than its counterpart in Table 1. For 2007 they are very close, for 2008 they are about
10 Elo points apart, for 2009 they are about 17 points apart, while for 2010 they are about
24 points apart. Although these results are very encouraging, we will see in the next section
that we can get much closer to the actual standard deviations by introducing skewness into
the model.

7 Taking Skewness into Account 

As discussed in Section 3, the actual rating data exhibits a small negative skew. We now
consider modifying the urn model presented in Section 4 to take this into account. Since
it is likely that a new player will enter with a rating lower than the average, we can model
this skewness in a simple way by making a small change to the way in which new players
are added. Instead of inserting the new player into the same urn as the chosen player A, say
$urn_i$, we put the new player into $urn_{i-k}$, where $k$ determines the amount of negative skew
we wish to introduce.
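
A small Monte Carlo sketch of the modified process is given below in Python. It is not the Matlab code used for the results in this paper: the function name, the seed handling and the parameter values in the example call are hypothetical, the opponent's urn offset is drawn uniformly without checking that the urn is occupied, and only the selected player's urn is updated.

```python
import math
import random


def simulate(steps, initial_players, r, w, k, seed=0, k_factor=20.0, width=20.0):
    """Simulate the urn process; returns a list holding the urn number of every player."""
    rng = random.Random(seed)
    c = (math.log(10) / 400) * width      # scaling factor in urn units

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-c * x))

    players = [0] * initial_players       # everyone starts in the central urn
    for _ in range(steps):
        a = rng.randrange(len(players))   # player A chosen uniformly at random
        i = players[a]
        if rng.random() < r:
            players.append(i - k)         # new player enters k urns below A's urn
        else:
            s = rng.randint(-w, w)        # opponent's urn offset, chosen uniformly
            psi = (k_factor / width) * logistic(s) * logistic(-s)
            u = rng.random()
            if u < psi:
                players[a] = i + 1        # A wins and the change is rounded up
            elif u < 2.0 * psi:
                players[a] = i - 1        # A loses and the change is rounded down
    return players


# Hypothetical small run, far smaller than the simulations reported in this section.
result = simulate(steps=20000, initial_players=50, r=0.05, w=1, k=3, seed=1)
mean_urn = sum(result) / len(result)
```

Setting k = 0 recovers the symmetric model of Section 4, while a positive k drags the mean urn number downwards as new players arrive.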

To validate the modified stochastic process, we ran a batch of simulations in Matlab,
starting the process with the actual rating data as of October 2006 and ending in January
2010. For the October 2006 starting data μ = 2105.007, σ = 163.552 and s = -0.1354
(as shown in Figure 2). From October to December 2006 the number of games played was
G = 259,662, and the number of new players was N = 1960. Using these values together with
the data in Table ^, we therefore took the total number of simulation steps to be
N + G = 3,769,819 and, as before, the rate r at which players enter the system to be 0.009553.
Tables 5, 6 and 7 show the average skewness s, mean rating μ and standard deviation σ over
10 simulations, for w = 1, 2 and 3, respectively, with k varying from 0 to 12. As a reference
point, for the actual rating data as of January 2010, μ = 2015.650 and σ = 209.622, as in
Table 1, and we computed s = -0.2284.
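
For readers who wish to reproduce the three summary statistics from a list of ratings, a minimal Python sketch is given below. It assumes the moment-based skewness m3 / m2^(3/2) and the population standard deviation; the text does not state which estimators were used, so both choices, as well as the function name summary_stats, are assumptions.

```python
import math


def summary_stats(ratings):
    """Return (s, mu, sigma) for a non-empty sequence of ratings.

    Assumption: s is the moment-based skewness m3 / m2**1.5 and sigma is
    the population standard deviation (division by n, not n - 1).
    """
    n = len(ratings)
    mu = sum(ratings) / n
    # Second and third central moments.
    m2 = sum((x - mu) ** 2 for x in ratings) / n
    m3 = sum((x - mu) ** 3 for x in ratings) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all ratings are equal")
    sigma = math.sqrt(m2)
    s = m3 / m2 ** 1.5
    return s, mu, sigma
```

A sample with a longer tail towards low ratings yields s < 0, which is the sign of the skew reported for the rating data.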

It can be seen that the results are rather similar in all three tables. As k is increased,
the skewness s becomes more negative, the mean decreases and the standard deviation σ
increases, as expected. The closest fit to the actual skewness s = -0.2284 and the standard
deviation σ = 209.622 is when k is 8. However, the closest fit to the mean Elo rating
μ = 2015.650 is when k is 11 or 12. The suggested values for k therefore correspond to a new
player being rated 160 to 230 Elo points below the average rating. This latter value is in broad
agreement with the value e = 227.6 obtained in Section ^ from Figure 2. Although this value
was obtained using the entire three-year period, the values calculated for the individual years
are similar, being roughly in the range 200 to 300. These results confirm that the
modified process is a reasonable model for obtaining rating data with the observed parameters,
despite the discrepancy between the values for k. This discrepancy is not surprising, since
the modified model, as a first approximation, is clearly an oversimplification. We note that
the value of w seems to have very little effect on the results, although it is possible that some
pattern might be noticeable if a significantly larger value for w were used.

Using  w  Year  Q       μ         σ                P       μ         σ
Ap,tp  1  2007  77016   2092.231  164.476  0.9993  77064   2092.179  164.890
Aq,tq  1  2007  74895   2103.906  164.826  0.9993  74899   2104.053  164.923
Ap,tp  2  2007  77048   2096.596  164.865  0.9994  77045   2096.646  164.933
Aq,tq  2  2007  75206   2090.258  164.489  0.9993  75233   2090.195  164.752
Ap,tp  3  2007  77201   2091.796  164.679  0.9994  77228   2092.199  164.981
Aq,tq  3  2007  75066   2095.629  164.533  0.9993  75111   2096.049  164.893
Ap,tp  1  2008  87049   2075.647  171.938  0.9994  87081   2075.632  172.178
Aq,tq  1  2008  84901   2077.914  172.079  0.9994  84962   2078.193  172.515
Ap,tp  2  2008  87080   2084.221  172.265  0.9995  87111   2083.853  172.573
Aq,tq  2  2008  84829   2069.993  172.106  0.9994  84843   2070.036  172.171
Ap,tp  3  2008  87073   2075.797  172.173  0.9994  87144   2075.890  172.676
Aq,tq  3  2008  84897   2070.297  171.903  0.9994  84926   2070.029  172.086
Ap,tp  1  2009  99267   2033.206  179.812  0.9995  99324   2033.229  180.227
Aq,tq  1  2009  96970   2036.315  179.620  0.9995  97030   2036.540  180.000
Ap,tp  2  2009  99121   2037.768  179.613  0.9995  99141   2037.980  179.862
Aq,tq  2  2009  96985   2041.688  179.421  0.9995  97043   2041.843  179.820
Ap,tp  3  2009  99149   2032.080  179.770  0.9995  99166   2031.835  179.943
Aq,tq  3  2009  97138   2032.044  179.542  0.9994  97172   2031.974  179.804
Ap,tp  1  2010  109458  2019.797  185.313  0.9995  109485  2019.818  185.528
Aq,tq  1  2010  107868  2026.636  186.163  0.9995  107893  2026.662  186.304
Ap,tp  2  2010  109457  2014.156  185.612  0.9995  109469  2013.943  185.691
Aq,tq  2  2010  107847  2021.331  185.932  0.9995  107860  2021.283  186.047
Ap,tp  3  2010  109373  2013.765  185.203  0.9994  109386  2013.833  185.344
Aq,tq  3  2010  107840  2020.426  185.611  0.9995  107865  2020.262  185.806

Table 4: Actual and fitted parameters for simulation results
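
The selection of k discussed in this section can be checked directly against the w = 1 simulation values reported in Table 5. The short Python sketch below simply picks, for each statistic, the k whose simulated value is nearest to the actual January 2010 value; the helper name closest_k is hypothetical, and the k = 0 label for the first simulated row is inferred from the stated range of k.

```python
# (k, skewness s, mean mu, standard deviation sigma) for w = 1, from Table 5.
TABLE5_W1 = [
    (0, -0.0920, 2105.641, 186.980),
    (1, -0.0884, 2097.907, 187.030),
    (2, -0.0933, 2089.894, 188.557),
    (3, -0.0995, 2082.068, 190.721),
    (4, -0.1064, 2074.321, 193.440),
    (5, -0.1276, 2066.435, 197.046),
    (6, -0.1541, 2058.940, 201.387),
    (7, -0.1881, 2050.446, 206.373),
    (8, -0.2309, 2043.175, 211.843),
    (9, -0.2745, 2035.115, 218.153),
    (10, -0.3206, 2027.404, 224.623),
    (11, -0.3742, 2019.250, 232.101),
    (12, -0.4227, 2012.005, 239.422),
]

# Actual rating data as of January 2010.
ACTUAL_S, ACTUAL_MU, ACTUAL_SIGMA = -0.2284, 2015.650, 209.622


def closest_k(rows, column, target):
    """Return the k whose value in the given column is nearest to target."""
    return min(rows, key=lambda row: abs(row[column] - target))[0]


best_s = closest_k(TABLE5_W1, 1, ACTUAL_S)
best_mu = closest_k(TABLE5_W1, 2, ACTUAL_MU)
best_sigma = closest_k(TABLE5_W1, 3, ACTUAL_SIGMA)
print(best_s, best_mu, best_sigma)
```

On these values the nearest k is 8 for both s and σ and 11 for μ, with k = 12 almost as close for μ (a difference of 3.645 against 3.600 Elo points).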

k    s        μ         σ
-    -0.2284  2015.650  209.622
0    -0.0920  2105.641  186.980
1    -0.0884  2097.907  187.030
2    -0.0933  2089.894  188.557
3    -0.0995  2082.068  190.721
4    -0.1064  2074.321  193.440
5    -0.1276  2066.435  197.046
6    -0.1541  2058.940  201.387
7    -0.1881  2050.446  206.373
8    -0.2309  2043.175  211.843
9    -0.2745  2035.115  218.153
10   -0.3206  2027.404  224.623
11   -0.3742  2019.250  232.101
12   -0.4227  2012.005  239.422

Table 5: Simulation results allowing skewness, for w = 1

8 Concluding Remarks

We have constructed a stochastic evolutionary urn model that generates the distribution of
players' ratings and have validated this model using published official rating data on chess
players. For the symmetric case, our analysis of the model yielded a Gaussian distribution,
which has the interesting feature that the variance increases logarithmically with time. This
implies that the distribution of ratings is quite stable, but has the tendency to flatten
extremely slowly over time. These results were validated by simulating the model. Although the
data is well approximated by a Gaussian, there is a small negative skew present in the data.
An improvement can be made to the model to account for this by breaking the symmetry
and putting new players into lower-numbered urns, corresponding to new players generally
having lower than average ratings. The modified stochastic process was validated by
simulation starting with actual rating data. Deriving analytically the distribution for the
modified process remains an open problem.

Throughout the paper we have assumed that the K-factor is fixed at 20. It would be
interesting to allow the K-factor to vary with players' ratings and the number of games they
have played, as suggested in [GJ99], and to see whether such a modification could shed some
light on the K-factor controversy mentioned in Section 0.
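
To make the role of the K-factor concrete, the following Python sketch shows the widely used logistic form of the Elo update for a single game. It is an illustration only: the urn model in this paper may discretise the update differently, the function names are hypothetical, and applying the same K to both players (so that the update is zero-sum) is an assumption.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against B under the logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k_factor=20.0):
    """Return the new ratings (A, B) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw and 0.0 for a loss.
    Assumption: both players share the same K-factor, so the points
    gained by one player are exactly those lost by the other.
    """
    delta = k_factor * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta
```

With K fixed at 20, a win between two equally rated players moves each rating by 10 points. Letting k_factor depend on a player's rating or number of games played would generally give the two players different values of K and so break the zero-sum property.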